Determination of the optimum definition of growth evaluation for indeterminate pulmonary nodules detected in lung cancer screening

Objective To determine the optimum definition of growth for indeterminate pulmonary nodules detected in lung cancer screening.
Materials and methods Individuals with indeterminate nodules, defined as a volume of 50–500 mm³ (solid nodules) or as a solid component volume of 50–500 mm³ or an average non-solid component diameter ≥8 mm (part-solid nodules), on baseline lung cancer screening low-dose chest CT (LDCT) were included. The average diameters and volumes of the nodules were measured on baseline and follow-up LDCTs with semi-automated segmentation. Sensitivities and specificities for lung cancer diagnosis were evaluated for nodule growth defined by a) percentage volume growth ≥25% (as defined in the NELSON study); b) absolute diameter growth >1.5 mm (as defined in Lung-RADS version 1.1); and c) subjective decision by a radiologist. Sensitivities and specificities of diagnostic referral based on various thresholds of volume doubling time (VDT) were also evaluated.
Results Altogether, 115 nodules (one nodule per individual; 93 solid and 22 part-solid nodules; 105 men; median age, 68 years) were evaluated (median follow-up interval: 201 days; interquartile range: 127–371 days). Percentage volume growth ≥25% exhibited higher sensitivity but lower specificity than absolute diameter growth >1.5 mm (sensitivity, 69.2% vs. 42.3%, p = 0.023; specificity, 82.0% vs. 96.6%, p = 0.002). Compared with volume growth, the radiologist's subjective decision had equivalent sensitivity (53.9%; p = 0.289) but higher specificity (98.9%; p = 0.002); compared with diameter growth, it did not differ significantly in either sensitivity or specificity (both p > 0.05). Compared to the VDT threshold of 600 days (sensitivity, 61.5%; specificity, 87.6%), VDT thresholds ≤200 and ≤300 days exhibited significantly lower sensitivity (30.8%, p = 0.013) and higher specificity (94.4%, p = 0.041), respectively.
Conclusion Growth evaluation of screening-detected indeterminate nodules with volumetric measurement exhibited higher sensitivity but lower specificity compared to diametric measurements.

Positive baseline screening LDCT results are defined based on the size and consistency of pulmonary nodules, and participants with indeterminate pulmonary nodules undergo follow-up LDCTs [5][6][7]. Meanwhile, in the follow-up LDCTs, the presence of nodule growth and the growth rate are key components for defining positive results that require invasive diagnostic procedures [4,8,9]. Conventionally, growth assessment of a pulmonary nodule is based on uni- or bi-dimensional diametrical measurement [10][11][12], and the lung CT screen reporting and data system (Lung-RADS) from the American College of Radiology defines nodule growth as an absolute increase in average diameter >1.5 mm [13]. Meanwhile, volumetric measurement is expected to detect nodule growth more sensitively and reduce inter- and intra-reader variability [12,14]. In the Dutch-Belgian lung cancer screening trial (NELSON), the growth of nodules was defined as a relative increase in nodule volume greater than 25%, and a volume doubling time (VDT) <400 days indicated positive results necessitating a diagnostic referral [4,8,9].
However, there is limited research on which definitions of nodule growth and growth rate are optimal for identifying lung cancer among indeterminate-sized pulmonary nodules detected in baseline screening LDCTs, defined as a volume of 50-500 mm³ (solid nodules) or as a solid component volume of 50-500 mm³ or an average non-solid component diameter ≥8 mm (part-solid nodules). Therefore, we aimed to evaluate the diagnostic accuracy for lung cancer of different criteria (i.e., diametrical measurement, volumetric measurement, and subjective decision by a radiologist) for growth and diagnostic referral of indeterminate pulmonary nodules in lung cancer screening.

Materials and methods
The Institutional Review Board of Seoul National University Hospital approved this study and waived the requirement for informed consent from the patients.

Study population
We enrolled the study population from two consecutive cohorts: (a) participants of the Korean Lung Cancer Screening (K-LUCAS) project enrolled in our institution between 2017 and 2018 [15,16]; and (b) subjects who underwent screening LDCTs for a health check-up at our institution and were consequently diagnosed with lung cancer between 2011 and 2019. The common inclusion criteria for the two cohorts were: (a) individuals with indeterminate baseline LDCT results defined by the criteria of NELSON (i.e., solid nodules with a volume of 50-500 mm³; part-solid nodules with a solid component volume of 50-500 mm³ or an average non-solid component diameter ≥8 mm; non-solid nodules with an average diameter ≥8 mm) [4,9]; (b) individuals with follow-up LDCT for nodules detected on baseline LDCT; and (c) individuals with solid or part-solid nodules, not non-solid nodules. Individuals in the K-LUCAS project whose nodules disappeared on the follow-up LDCT were excluded (Fig 1). In this study, we defined pathologically proven pulmonary nodules as lung cancers and regarded all other nodules as benign.
For individuals with two or more nodules on baseline LDCTs, one dominant nodule was selected using the following criteria, because the most suspicious nodule guides the patient's management strategy [11]: (a) the nodule with the largest volume (volume of the solid component for part-solid nodules) was selected and (b) solid nodules were accorded priority over part-solid nodules.
Consequently, we included 115 indeterminate nodules from 115 individuals in this study.

LDCT examination
CT examinations were performed using one of the nine different scanners from four manufacturers (

Nodule measurement
To evaluate nodule size, a thoracic radiologist (E.J.H., 11 years of experience in chest CT interpretation) measured the indeterminate pulmonary nodules on the baseline and follow-up LDCTs using commercial software (A-view LungScreen, Coreline Soft). Once the user designated the target nodule, the software automatically segmented the boundary of the nodule (with separate segmentation of the ground-glass and solid components for part-solid nodules). If the user judged the automated segmentation to be inappropriate, the segmentation could be adjusted manually. Subsequently, the software provided the maximum average diameter measured on the transverse plane (average diameter, hereafter) and the volume of the target nodule based on the segmentation results (Fig 2). This software was implemented during the first year of the K-LUCAS project, and attending thoracic radiologists in the participating institutions read lung cancer screening LDCTs using this software [16][17][18].
To evaluate inter-reader agreement of the semi-automated nodule measurement, 14 randomly sampled nodules (solid nodules, n = 8; part-solid nodules, n = 6) were independently

Evaluation of nodule growth
To evaluate nodule growth, the following metrics were evaluated for each nodule based on the semi-automated measurement (separate evaluation of the whole nodule including ground- VDTs were evaluated in nodules that exhibited a percentage volume growth ≥25%, as suggested by the NELSON and European position statement [4,8,9,19]. For subjective evaluation of nodule growth, one thoracic radiologist (J.H.L.) and one general radiologist (W. H. L., 7 years of experience in chest CT interpretation), who were blinded to the diagnosis of lung cancer, reviewed baseline and follow-up LDCTs and decided whether (a) there was any growth of the nodule and (b) a diagnostic process other than follow-up LDCT was required (diagnostic referral, hereafter). For the subjective evaluation of nodule growth, measurement of nodules using an electronic caliper was allowed, but the semi-automated measurement results, including nodule volume, were not provided. Inter-reader agreement was evaluated between the interpretations of the two radiologists, while only the interpretation by one radiologist (J.H.L.) was used for the performance evaluation.
To evaluate the diagnostic performance for lung cancer of different definitions of nodule growth, we evaluated three definitions: (a) percentage volume growth of ≥25% (as defined in the NELSON) [4,8,9]; (b) absolute diameter growth of >1.5 mm (per the Lung-RADS) [13]; and (c) any growth defined by the subjective evaluation of the radiologist. For part-solid nodules, growth of either the ground-glass or solid components was considered growth.
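The two measurement-based definitions above can be sketched in a few lines of Python. This is an illustrative sketch only, not the software used in the study: all function names are hypothetical, and the sphere-based conversion from volume to diameter is an assumption made purely to show why a volumetric threshold can flag growth that a diameter threshold misses.

```python
import math

def percentage_volume_growth(v_baseline_mm3, v_followup_mm3):
    """Relative change in volume, in percent of the baseline volume."""
    return 100.0 * (v_followup_mm3 - v_baseline_mm3) / v_baseline_mm3

def absolute_diameter_growth(d_baseline_mm, d_followup_mm):
    """Absolute change in average diameter, in mm."""
    return d_followup_mm - d_baseline_mm

def grew_by_volume(v0, v1, threshold_pct=25.0):
    # NELSON-style definition: percentage volume growth >= 25%.
    return percentage_volume_growth(v0, v1) >= threshold_pct

def grew_by_diameter(d0, d1, threshold_mm=1.5):
    # Lung-RADS-style definition: absolute diameter growth > 1.5 mm.
    return absolute_diameter_growth(d0, d1) > threshold_mm

def sphere_diameter_mm(volume_mm3):
    # Hypothetical helper (assumption): diameter of a perfect sphere
    # with the given volume. Real nodules are not spheres.
    return (6.0 * volume_mm3 / math.pi) ** (1.0 / 3.0)

# Hypothetical nodule growing from 100 to 130 mm^3 (30% volume growth).
v0, v1 = 100.0, 130.0
d0, d1 = sphere_diameter_mm(v0), sphere_diameter_mm(v1)
print(grew_by_volume(v0, v1))    # True: 30% >= 25%
print(grew_by_diameter(d0, d1))  # False: about 0.5 mm, below 1.5 mm
```

Under the spherical assumption, a 30% increase in volume corresponds to only about half a millimeter of diameter change, so the same nodule counts as growing under the volumetric definition but not under the diametric one.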
We evaluated the diagnostic performance of different thresholds of VDT (VDT of 600, 500, 400, 300, 200, and 100 days) and the radiologists' subjective diagnostic referrals for diagnosing lung cancer.
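For readers unfamiliar with VDT, the following is a minimal sketch of how a VDT-based referral rule could be computed. It assumes the commonly used exponential-growth formula, VDT = Δt · ln 2 / ln(V2 / V1); the text does not state the exact formula applied in this study, so the formula, the function names, and the inclusive comparison (VDT ≤ threshold) are assumptions for illustration.

```python
import math

def volume_doubling_time(v_baseline_mm3, v_followup_mm3, interval_days):
    """VDT in days under an assumed exponential growth model."""
    if v_baseline_mm3 <= 0 or v_followup_mm3 <= 0:
        raise ValueError("volumes must be positive")
    if v_followup_mm3 <= v_baseline_mm3:
        # No growth: the nodule never doubles under this model.
        return math.inf
    return interval_days * math.log(2) / math.log(v_followup_mm3 / v_baseline_mm3)

def referral_by_vdt(v0, v1, interval_days,
                    vdt_threshold_days=400.0, growth_threshold_pct=25.0):
    """Refer only if the nodule grew by >= 25% AND its VDT is within the threshold."""
    growth_pct = 100.0 * (v1 - v0) / v0
    if growth_pct < growth_threshold_pct:
        return False  # VDT is only evaluated in nodules with volume growth >= 25%
    return volume_doubling_time(v0, v1, interval_days) <= vdt_threshold_days

# Hypothetical nodule that doubles from 100 to 200 mm^3 in 201 days.
print(volume_doubling_time(100.0, 200.0, 201))    # 201.0
print(referral_by_vdt(100.0, 200.0, 201, 400.0))  # True
print(referral_by_vdt(100.0, 200.0, 201, 200.0))  # False
```

Lowering the threshold from 400 to 200 days turns this hypothetical nodule from a referral into a non-referral, which is the sensitivity-specificity trade-off examined across the six thresholds.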

Statistical analysis
To evaluate the diagnostic performance for lung cancer of each growth metric (i.e., percentage volume growth, absolute diameter growth, and VDT), we performed receiver-operating characteristic curve analyses, and area under the receiver-operating characteristic curves (AUCs) were obtained. Sensitivity and specificity were obtained for diagnostic performance at specific thresholds of growth and diagnostic referral. Comparison of AUCs was performed using the method suggested by DeLong, while comparisons of sensitivities and specificities were performed using McNemar tests. Subgroup analyses were performed in the same manner, with nodules presenting as solid nodules on baseline screening CTs.
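As an illustration of the paired comparison described above, the sketch below computes sensitivity and specificity for a binary growth criterion and a two-sided exact (binomial) McNemar p-value for two criteria applied to the same nodules. The study does not specify which variant of the McNemar test was used, so the exact form is an assumption, and the function names and data are hypothetical.

```python
from math import comb

def sensitivity_specificity(predicted, truth):
    """predicted/truth: equal-length sequences of 0/1 (1 = positive / cancer)."""
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    fn = sum(1 for p, t in zip(predicted, truth) if not p and t)
    tn = sum(1 for p, t in zip(predicted, truth) if not p and not t)
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    return tp / (tp + fn), tn / (tn + fp)

def mcnemar_exact(calls_a, calls_b):
    """Two-sided exact McNemar p-value for paired binary calls.

    To compare sensitivities, pass only the calls made on cancers;
    to compare specificities, pass only the calls made on benign nodules.
    """
    only_a = sum(1 for a, b in zip(calls_a, calls_b) if a and not b)
    only_b = sum(1 for a, b in zip(calls_a, calls_b) if b and not a)
    n = only_a + only_b  # discordant pairs
    if n == 0:
        return 1.0
    k = min(only_a, only_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)

# Hypothetical example: 8 cancers, criterion A detects 7, criterion B detects 1.
calls_a = [1, 1, 1, 1, 1, 1, 1, 0]
calls_b = [0, 0, 0, 0, 0, 0, 1, 0]
print(mcnemar_exact(calls_a, calls_b))  # 0.03125
```

Only the discordant pairs (nodules called positive by one criterion but not the other) enter the test, which is why a paired test is appropriate when both criteria are applied to the same nodules.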
Inter-reader agreement for semi-automated lung nodule measurement was evaluated using the intraclass correlation coefficient and Bland-Altman plots [20,21], while inter-reader agreements for nodule growth and diagnostic referral by radiologists' subjective interpretation were evaluated with percentage agreement and Cohen's kappa coefficient [22].
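The agreement statistics named above can be sketched as follows. This is a minimal illustration with hypothetical function names and made-up data; it assumes binary ratings (growth or no growth) for Cohen's kappa and the conventional mean difference ± 1.96 standard deviations for the Bland-Altman limits of agreement.

```python
from statistics import mean, stdev

def percentage_agreement(reader1, reader2):
    """Share of cases (in percent) on which two readers give the same call."""
    same = sum(1 for a, b in zip(reader1, reader2) if a == b)
    return 100.0 * same / len(reader1)

def cohens_kappa(reader1, reader2):
    """Cohen's kappa for two readers giving binary (0/1) calls."""
    n = len(reader1)
    p_observed = sum(1 for a, b in zip(reader1, reader2) if a == b) / n
    p1 = sum(reader1) / n  # reader 1's rate of positive calls
    p2 = sum(reader2) / n  # reader 2's rate of positive calls
    p_chance = p1 * p2 + (1 - p1) * (1 - p2)
    if p_chance == 1.0:
        return 1.0  # both readers gave one identical call for every case
    return (p_observed - p_chance) / (1 - p_chance)

def bland_altman_limits(measure1, measure2):
    """95% limits of agreement: mean difference +/- 1.96 * SD of differences."""
    diffs = [a - b for a, b in zip(measure1, measure2)]
    bias, sd = mean(diffs), stdev(diffs)
    return bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical growth calls by two readers on four nodules.
r1, r2 = [1, 1, 0, 0], [1, 0, 0, 0]
print(percentage_agreement(r1, r2))  # 75.0
print(cohens_kappa(r1, r2))          # 0.5
```

Kappa corrects the raw agreement for agreement expected by chance, so a moderately high percentage agreement can coexist with a low kappa when one category dominates the calls.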
All statistical analyses were performed using Medcalc version 20.009 (MedCalc Software Ltd) and R version 4.1.0 (R Project for Statistical Computing), and a p-value of <0.05 was considered to indicate statistical significance.

Baseline characteristics
Of the 115 individuals included in the study (men, n = 105; women, n = 10; median age, 68 years; interquartile range [IQR], 63-72 years), 46 were current smokers, 59 were former smokers, and 10 were non-smokers. In 112 individuals with pack-year information, the median smoking burden was 40 pack-years (IQR, 30-43 pack-years). The median time interval between baseline and follow-up LDCT was 201 days (IQR, 127-371 days). Of the 115 nodules, 93 were solid nodules, 22 were part-solid nodules, and 26 were lung cancers (22.6%; solid nodules, n = 11; part-solid nodules, n = 15). The average diameter and volume of the 115 nodules in the baseline and follow-up LDCT are tabulated in Table 1.

Nodule growth and lung cancer diagnosis
The percentage volume growth and absolute diameter growth based on semi-automated nodule measurements are summarized in Table 1 Table).

Volume doubling time and lung cancer diagnosis
Among the 25 nodules that exhibited volume growth ≥25%, average VDT ± standard deviation was 260.  (Fig 3). The sensitivities and specificities of the different VDT thresholds are described in Table 3   , while VDT thresholds of 300 days or shorter exhibited significantly higher specificity than VDT threshold of 600 days.
In the subgroup analysis with solid nodules only, the AUC of VDT for lung cancer diagnosis was 0.867 (95% CI: 0.725-1.000) (S1 Fig). The sensitivities and specificities of the different VDT thresholds are described in S2 Table. The added value of diagnostic referral by subjective interpretation of the radiologist in all nodules and solid nodules are described in S3

Inter-reader agreement
Regarding inter-reader agreement for semi-automated measurement of pulmonary nodules (n = 14), the average diameter and volume of the nodules showed intraclass correlation coefficients of 0.773 (95% CI: 0.509-0.895) and 0.878 (95% CI: 0.736-0.944), respectively. In Bland-Altman plots (Fig 4), the 95% limits of agreement for the average diameter of nodules were -2.9 to 2.6 mm, while those for nodule volume were -85.5 to 107.4 mm³. For growth evaluation of these 14 nodules, Cohen's kappa coefficient and percentage agreement were 0.696 (95% CI: 0.324-1.000) and 85.7% for percentage volume growth ≥25%, and 0.054 (95% CI: -0.477-0.585) and 64.3% for absolute diameter growth >1.5 mm.
Fig 4. Bland-Altman plots for agreement (A) between mean diameters measured by two radiologists and difference of diameters between the two radiologists (the 95% limits of agreement were -2.9 mm to 2.6 mm), and (B) between mean volumes measured by two radiologists and difference of volumes between the two radiologists (the 95% limits of agreement were -85.5 mm³ to 107.4 mm³).

Discussion
In our study, growth of screening-detected indeterminate pulmonary nodules defined as percentage volume growth ≥25% exhibited higher sensitivity and lower specificity for the diagnosis of lung cancer compared to growth defined as absolute diameter growth >1.5 mm (sensitivity, 69.2% vs. 42.3%; specificity, 82.0% vs. 96.6%). Regarding diagnostic referral based on VDT, thresholds ≤200 and ≤300 days exhibited significantly lower sensitivity (30.8%) and higher specificity (94.4%), respectively, than a VDT threshold of 600 days (sensitivity, 61.5%; specificity, 87.6%).
The major advantage of volumetric measurement of pulmonary nodules is that it can sensitively detect nodule growth [12,14]. The 25% threshold has been considered the margin of measurement variability [23,24] and has been adopted in several European lung cancer screening trials [4,25,26]. Diametric changes of up to 1.5 or 2 mm are usually regarded as measurement variability [10,11,13]. Concordant with this, our results suggest that volumetric measurement of lung nodules with a threshold of percentage volume growth ≥25% can detect early lung cancer more sensitively than diametrical measurement (69.2% vs. 42.3%), although their AUCs were not significantly different (0.812 vs. 0.810). However, percentage volume growth ≥25% showed lower specificity than absolute diameter growth >1.5 mm (82.0% vs. 96.6%) in our study. In other words, nodule growth defined by volumetry can lead to false-positive detection of growth in benign nodules, which can be due to measurement variability or true growth of benign nodules.
To reduce false-positive nodule growth, further evaluation of the growth rate of nodules is required. Previous studies have reported a relatively wide range of VDTs for lung cancers, from 100 to 600 days [11]. Indeed, considering the VDT of lung cancers, previous lung cancer screening trials have adopted a VDT threshold of 400 days for diagnostic referral [4,27], while the European position statement suggested a more conservative threshold of 600 days [19]. In our study, the 600-day VDT threshold exhibited sensitivity and specificity of 61.5% and 87.6%, respectively, and VDT thresholds <600 days resulted in lower sensitivity and higher specificity. All VDT thresholds showed lower sensitivities than diagnostic referrals based on the subjective decision of the radiologist, suggesting that a substantial proportion of lung cancer patients might undergo diagnostic delays if diagnostic referral were based on VDT alone. This could be due to pulmonary adenocarcinomas appearing as part-solid nodules, which usually show relatively longer VDTs [11]. Indeed, in our study, 17 of 26 lung cancers (65.4%; solid nodules, n = 8; part-solid nodules, n = 9) had VDTs ranging from 100 to 600 days, whereas the other 9 lung cancers (34.6%) had VDTs shorter than 100 days (part-solid nodule, n = 1) or longer than 600 days (solid nodules, n = 3; part-solid nodules, n = 5). Reflecting this, in a subgroup analysis with only solid nodules, a 600-day VDT threshold exhibited slightly higher sensitivity than the subjective decision by the radiologist (72.7% vs. 63.6%).
It would be difficult to define an optimum VDT threshold because there is a trade-off between the benefit of sensitive detection of early lung cancer and the cost of unnecessary diagnostic referral or invasive procedures. A previous study by Heuvelmans et al. suggested a 232-day VDT threshold for the identification of lung cancers in three-month follow-up LDCTs [28]. However, in our study, VDT thresholds ≤200 days led to a substantial reduction in sensitivity. Nonetheless, adding subjective decisions for diagnostic referral by a radiologist can improve the balance between sensitivity and specificity. In our study, the VDT threshold of 200 days combined with radiologist adjudication for diagnostic referral resulted in the same sensitivity at higher specificity compared with the VDT threshold of 600 days.
Reduced inter- or intra-reader variability is another key advantage of volumetric lung nodule measurement using segmentation. Simple diametric measurement, or volume estimation based simply on tumor diameter, cannot reflect the three-dimensional nature of pulmonary nodules and is therefore prone to inter- or intra-reader variability [10,12,29]. Concordant with previous studies [10,12,14,29,30], our results showed a higher kappa coefficient (0.696) and percentage agreement (85.7%) for nodule growth with volumetric measurement than with diametric measurement (kappa coefficient = 0.054, percentage agreement = 64.3%) or subjective adjudication by the radiologists (kappa coefficient = 0.276, percentage agreement = 57.1%).
This study had several limitations. First, the retrospective design and relatively small study population might have limited the reliability of our results. For example, heterogeneity of CT scanners or protocols, even between the baseline and follow-up CT of one individual, could have affected the results, but this was unavoidable given the retrospective nature of this study. To overcome this limitation, a prospective study with uniform CT scanners and protocols is warranted. Alternatively, applying state-of-the-art techniques such as a deep learning-based image reconstruction kernel conversion model may be helpful. Second, a diagnostic case-control study, in which researchers collect disease-positive and disease-negative cases through convenience sampling, cannot reflect real-world screening settings because its disease prevalence is unrealistic. Indeed, this study has a selection bias in that two cohorts with different characteristics were included. That is, while individuals in the K-LUCAS project were included regardless of whether lung cancer was diagnosed, individuals who underwent health check-ups at our institution were included only if they were diagnosed with lung cancer. Third, although inter-observer variability was investigated for the measurement of average diameter and volume and for subjective assessment, the participation of only two radiologists may limit the generalizability of the results. Furthermore, we used only a single software, although the choice among commercially available segmentation software can affect the measurement and classification of nodules [12].
In conclusion, growth evaluation of screening-detected indeterminate nodules with volumetric measurement exhibited higher sensitivity but lower specificity compared to diametric measurements.
Supporting information S1 Table. Comparison of diagnostic performance for lung cancer diagnosis between growth adjudication of volumetric and diametric measurements and subjective radiologist's assessment in 93 solid nodules.